AIBase

ZhiYuan Releases the World's Largest Chinese-English Semantic Vector Model Training Dataset MTP

The ZhiYuan Research Institute has released MTP, the world's largest open-source training dataset for Chinese-English semantic vector models, containing 300 million text pairs. The pairs are drawn from multiple sources and cover types such as Q&A, comments, and news, providing a foundation for training semantic vector models. The institute stated that this data plays a crucial role in training large models and that it will promote collaborative innovation in artificial intelligence. The release is also expected to help address the shortage of training datasets for Chinese models.
